Tagging obtained content for white and black listing

ABSTRACT

A system and method for providing enhanced security with regard to obtained files is presented. Upon obtaining a file from an external location, the obtained file is tagged with tagging information regarding the origin of the obtained file. Additionally, an operating system suitable for execution on a computing device is also presented. The operating system includes at least one application-callable function (API) for obtaining content from an external location. Each application-callable function for obtaining content from an external location is configured to associate tagging information with each obtained file, the tagging information comprising the origin of the obtained file. The origin of the obtained file can be used for subsequent security policy decisions, such as whether to allow or block execution or rendering of the content, as well as whether the content will be accessed in a constrained environment such as a “sandbox” or virtual machine.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 11/450,608, filed on Jun. 9, 2006, entitled “TAGGING OBTAINED CONTENT FOR WHITE AND BLACK LISTING”, which is a continuation-in-part of U.S. patent application Ser. No. 10/977,484, filed Oct. 29, 2004, entitled “EFFICIENT WHITE LISTING OF USER-MODIFIABLE FILES”, each of which are hereby incorporated by reference in their entirety.

BACKGROUND

An unfortunate reality of operating a computer, especially one connected to a network, is that the computer is constantly under attack. These attacks come in a variety of forms including, but not limited to, computer viruses, worms, computer exploits (i.e., abusing or misusing legitimate computer services), adware or spyware, and the like. While the mechanism of operation for each of these various computer attacks is quite distinct, in general, they are all designed to carry out some unauthorized, usually unwelcomed, often destructive, activity on the computer. For purposes of the present invention, these attacks will be generally referred to hereafter as malware.

As malware is a reality for computers generally, and for network computers in particular, various tools have been devised and deployed to prevent malware from performing its malicious intent on a computer. These tools include firewalls, proxies, and security settings on vulnerable applications. However, the most commonly used tool in protecting a computer against malware is antivirus software.

As those skilled in the art will appreciate, most antivirus software operates as a pattern recognition service. In particular, when a file is received by a computer, irrespective of whether the file is an executable, word processing document, image, or the like, the antivirus software protecting that computer “analyzes” the file to determine whether it is known to be malware. The antivirus software “analyzes” the file by generating a hash value, referred to as a signature, for the file. This signature is generated such that it is extremely unlikely that another file will have the same signature, and is therefore considered unique to that file. Once the signature is generated, the signature is then compared against other signatures of known malware in a so-called signature file. Thus, if the file's generated signature matches a signature of known malware in the signature file, the antivirus software has discovered the file to be malware and takes appropriate action.
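
By way of illustration only, the signature comparison described above can be sketched in Python as follows. The choice of SHA-256 as the hash function, the names `file_signature` and `is_known_malware`, and the in-memory set standing in for the signature file are assumptions made for this sketch and are not drawn from any particular antivirus product.

```python
import hashlib


def file_signature(data: bytes) -> str:
    """Return a hash-based signature computed over the full file contents."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for a signature file: signatures of samples already known to be malware.
# The sample bytes are placeholders, not real malware.
SIGNATURE_FILE = {file_signature(b"example malicious payload")}


def is_known_malware(data: bytes) -> bool:
    """Report whether the file's signature matches a known-malware signature."""
    return file_signature(data) in SIGNATURE_FILE
```

A file whose signature is absent from the set is simply unknown to this check; the absence of a match does not establish that the file is safe.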

Unfortunately, signature recognition requires that the malware be previously known (and identified) in order to protect the computer from the malware. Thus, antivirus software is not a time-zero protection, i.e., it does not protect the computer from malware as soon as the malware is released on the network, or at time-zero. Instead, a vulnerability window exists between the time that a new, unknown malware is released and the time that antivirus software is able to protect a computer from the new malware.

FIG. 1 is a block diagram of an exemplary timeline 100 illustrating the vulnerability window associated with current antivirus software's signature recognition. As shown in FIG. 1, at some point in time, as indicated by event 102, a malicious party releases a new, unknown malware onto a network, such as the Internet. Obviously, once the new, unknown malware is released, computers connected to the network are at risk or vulnerable. Hence, the vulnerability window is opened.

While the actual time for detecting a new malware on a network depends on numerous factors, including the virulence of the new malware, according to available statistics, it generally takes between four hours to three days for the antivirus software community, i.e., antivirus software providers, to detect or become aware of the new malware. Once detected, as indicated by event 104, the antivirus community can begin to identify the malware. In addition to generating a signature for the new malware, identifying the malware also typically involves researching/determining the ultimate effect of the malware, determining its mode of attack, identifying system weaknesses that are exposed by the attack, and devising a plan to remove the malware from an infected computer.

After having identified the malware, which typically takes approximately four hours (at least for signature identification), an antivirus provider will post an updated signature file on its download service, as indicated by event 106. Unfortunately, computers (either automatically or at the behest of the computer user) do not immediately update their signature files. It typically takes between four hours and one week for most computers to update their signature files, as indicated by event 108. Of course, it is only after the updated signature file is downloaded onto a computer that the antivirus software can defend the computer from the new malware, thereby closing the vulnerability window 110. Indeed, depending on individual circumstances, such as when the computer owner is on vacation, updating a computer with the latest signature files can take significantly longer than one week.

As can be seen, a new, unknown malware has anywhere from several hours to several weeks to perform malicious havoc on the network community, unchecked by any antivirus software. Antivirus software is not time-zero protection. The good news is that most computers are protected before a malware tries to attack any one computer. Unfortunately, some are exposed during the vulnerability window and are infected by the malware. To most, especially those that rely heavily upon their computers, this vulnerability window is entirely unacceptable.

Those skilled in the art will readily recognize that it is important to generate a signature for a file such that the signature uniquely identifies the file and can be used to identify malware. Sophisticated algorithms and mathematics are involved with computationally generating a signature that positively identifies a file and, at the same time, does not identify any other file. Unfortunately, in order to generate a signature that uniquely identifies the file, the algorithms used are extremely sensitive to the contents of the file. Any modification to a file will cause the signature generation algorithm to generate a different signature than for the original file. In other words, a simple, cosmetic change to a known malware will cause the signature generation algorithm to return an entirely different signature. Thus, a cosmetic change to a known malware (i.e., one identified by its signature in a signature file) is usually sufficient to enable the modified malware to escape detection, at least until the modified malware has been recognized, and its signature generated and stored in a signature file.
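
The sensitivity described above can be illustrated with a short Python sketch. SHA-256 is assumed here as the signature algorithm, and the sample bytes, including the bracketed `[macro]` and `[text]` markers, are hypothetical placeholders rather than any real file format.

```python
import hashlib


def file_signature(data: bytes) -> str:
    """Signature computed over the entire file, as in whole-file hashing."""
    return hashlib.sha256(data).hexdigest()


# A known sample: a fixed "macro" portion plus user-visible text.
known_malware = b"[macro] run_payload()\n[text] Please enable macros."

# A cosmetic edit to the text only; the macro portion is untouched.
variant = known_malware.replace(b"Please", b"Kindly")

# The signature file knows only the original sample.
signature_file = {file_signature(known_malware)}

original_detected = file_signature(known_malware) in signature_file
variant_detected = file_signature(variant) in signature_file
```

In this sketch `original_detected` is true while `variant_detected` is false, even though the payload in both samples is byte-for-byte identical.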

The problem of malware generally is compounded by the fact that malware is often embedded in user-modifiable files. For example, malware may be disguised in and distributed as an executable script embedded within a word processing document. In these cases, the malware portion (i.e., the embedded script) is entirely unrelated to the editable portion of the document. Thus, modifications, small or large, to the data area of the word processing document will cause the complete malware file to yield a different signature than its original, while the embedded malicious script remains unaffected. These user-modifiable files include, but are not limited to, word processing documents, spreadsheets, images, HTML documents, and the like. Furthermore, malware creators, in order to stay ahead of antivirus software detection, have begun creating self-modifying malware: documents that randomly modify some portion of the file in order to remain undetected by antivirus software. Clearly, then, in many cases, it is very difficult to stay ahead of the malware that is released, especially when malware must be known in order to be stopped.

Of course, as mentioned above, newly-released malware is not always immediately identifiable by any signature. For this reason, many computer users restrict the locations that they visit on the Internet to trusted or known locations, i.e., locations with which they are reasonably confident that the available content is malware-free. In this manner, cautious users minimize their exposure to malware. Unfortunately, once a file is downloaded onto a user's computer, it is assumed that the file is safe for use (e.g., display, execution, editing, etc.). However, the mere presence of a file on a computer system does not mean that the file is safe. Just as with visiting only trusted internet locations, it would be beneficial if a user could, a priori, know the location from which certain content has been obtained. Armed with the knowledge of the content's origin, a user can be cautious with regard to acting upon a file (e.g., executing or displaying a file, installing a module on a computer, and the like). Accordingly, as files and/or content are obtained, they could be tagged with origin information. Still further, it would be beneficial if a computer system could identify the location from which a file or content has been obtained and act upon it according to its trustworthiness as identified in a set of predetermined rules, a white-list of trusted sites, and/or a black-list of untrustworthy sites.

SUMMARY OF THE INVENTION

A computer system for providing enhanced security with regard to obtained files is presented. The computer system includes a processor and a memory. Moreover, the computer system further includes a file system. Upon obtaining a file from an external location, the computer system is configured to tag the obtained file with tagging information regarding the origin of the obtained file.

According to additional aspects, a method for enhancing the security of a computing device with regard to a file obtained from an external source is presented. The method comprises first obtaining a file from an external source. Once or as the file is obtained from the external source, the obtained file is tagged with tagging information identifying the origin of the obtained file.

According to yet further aspects, an operating system suitable for execution on a computing device having a processor and memory is presented. The operating system comprises at least one application-callable function for obtaining content from an external location. Moreover, each application-callable function for obtaining content from an external location is configured to associate tagging information with each obtained file, the tagging information comprising the origin of the obtained file.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of an exemplary timeline illustrating the vulnerability window associated with antivirus software and, particularly, to signature recognition methods;

FIG. 2 is a block diagram illustrating an exemplary user-modifiable document;

FIG. 3 is a block diagram illustrating the exemplary user-modifiable document of FIG. 2, and for further illustrating that only certain segments of the file are needed to develop a signature for the file;

FIG. 4 is a pictorial diagram illustrating an exemplary networked environment suitable for implementing aspects of the present invention;

FIG. 5 is a block diagram illustrating an exemplary white list data store suitable for use in the present invention;

FIG. 6 is a flow diagram illustrating an exemplary routine suitable for determining whether a file is white listed as a trusted file according to aspects of the present invention;

FIG. 7 is a flow diagram illustrating an exemplary generate signature routine adapted according to aspects of the present invention;

FIG. 8 is a block diagram illustrating components of an exemplary computer system suitable for tagging obtained content and acting upon obtained content according to predetermined rules;

FIGS. 9A-9C are pictorial diagrams illustrating exemplary file system implementations with tagging information;

FIG. 10 is a flow diagram illustrating an exemplary tag content routine for tagging obtained content;

FIG. 11 is an alternative flow diagram illustrating an alternative exemplary tag content routine for tagging obtained content; and

FIG. 12 is a block diagram illustrating aspects of an exemplary operating system configured to automatically tag an obtained file with tagging information.

DETAILED DESCRIPTION

According to aspects of the present invention, rather than generating a malware signature based on the entire user-modifiable document, only a portion of a document is used as a basis for generating the signature. More particularly, a malware signature is generated based on certain, more permanent portions of a user-modifiable file. By basing the malware signature on those portions of a user-modifiable document that tend to be more permanent, the ability of malware creators and self-modifying malware to escape detection through simple, cosmetic alterations is substantially reduced, if not completely eliminated.

Those skilled in the art will appreciate that a user-modifiable document includes numerous elements, some of which tend to be more permanent than others. It is generally those more permanent elements/portions of the document upon which the present invention bases its signature. FIG. 2 is a block diagram illustrating an exemplary user-modifiable document 200 and for discussing the various elements of the user-modifiable document.

As shown in FIG. 2, the user-modifiable document 200 includes various elements/portions such as macros 202, templates 204, embedded objects 206, such as Active X and COM objects, applied styles 208, and the like. Each of these elements tends to be more permanent, i.e., is not modified each time a user edits the user-modifiable document. Additionally, these are the types of document elements that contain the “core” of the malware. For example, malware creators embody their malicious designs in the form of macros or Active X controls. These are then placed in user-modifiable files, such as word processing documents, spreadsheets, or images. Any information in the user data areas, such as user data areas 210 and 212, typically has little or no effect on the malware per se, but often includes information that would entice a user to activate and/or release the malware onto the unsuspecting user's computer. Thus, as already mentioned, due to the nature of current signature-based detection systems, variants of malware are easily produced through cosmetic changes to the document.

It should be understood that while the present discussion may use the term “user-modifiable” file, it is for description purposes only, and represents only one type of file applicable for the present invention. As mentioned above, quite often malware, distributed as applications, will include data areas whose modification does not affect the functionality of the malware. These data areas will be referred to hereafter as superficial data areas. User-modifiable files include superficial data areas, i.e., areas that a user (or embedded malware) may modify without affecting the embedded malware. Accordingly, it should be understood that “user-modifiable” files, or files with superficial data areas, include all files that include both data areas whose modification affects the functionality of the malware (referred to generally as the more permanent portions of the file) and areas whose modification has no functional effect on the malware (referred to generally as user-modifiable data areas or as superficial data areas).

FIG. 3 is a block diagram illustrating the exemplary user-modifiable document 200 and further illustrating that only portions of the document are used in generating a signature for the document. As mentioned above, according to the present invention, when generating a file signature, the more permanent portions of a user-modifiable document, such as, but not limited to, macros 202, templates 204, styles 208, and embedded objects 206, are identified and used. Conversely, the user data portions, such as user data areas 210 and 212, are filtered out of the signature generation process.

As mentioned above, even when basing malware signatures on more permanent aspects of a user-modifiable file, malware detection does not always provide time-zero protection, i.e., protection the moment a malware file is released. According to aspects of the present invention, in order to provide time-zero protection to a computer or network, files that are trusted not to be malware are identified on a so-called white list. As a file arrives at a computer, but before it can be utilized on the computer, a signature for that file is generated and compared against a white list of files that are known to be trusted. According to further aspects of the present invention, the signature of the file, if the file is a user-modifiable file, is based on its more permanent portions, as discussed above. In this manner, a user-modifiable file can be edited and easily distributed among computers with full confidence that distribution of the file is trustworthy. Conversely, those files that cannot be matched against signatures in the white list are considered untrustworthy, and security policies can be carried out to protect the computer and/or network. In this manner, time-zero protection is realized.

According to the present invention, a white list may be locally stored on a computer, on a trusted network location, or both. The present invention is not limited to any one configuration or arrangement. Additionally, according to one embodiment, a computer may rely upon a plurality of white lists for a variety of reasons, including efficiency and redundancy. FIG. 4 is a pictorial diagram illustrating one exemplary network configuration 400 of a white list available to a plurality of computers. As shown in FIG. 4, the exemplary network configuration 400 includes a white list service 408 that receives requests from computers, such as computers 402-406, to identify whether a received file is white listed. The white list service 408 may be a Web server connected to the Internet 412, but the present invention is not so limited.

While the white list service 408 may be strictly a white listing service, i.e., one that provides information as to files on a white list, alternatively, the white list service may provide information for both white listed files as well as black listed files, i.e., known malware.

The white list service 408 is illustrated as being coupled to a white list data store 410. The white list data store includes those files that have been identified as trustworthy files. In one embodiment, the white list data store 410 is a database of white listed files. While the present illustration identifies the white list service 408 and white list data store 410 as separate entities, it is a logical separation for illustration and discussion purposes. In an actual embodiment, the white list data store and the white list service may be incorporated as a single entity, or as a service offered on a computer.

While in one embodiment, the white list data store includes only signatures of white listed files, the present invention is not so limited. Quite frequently, the level of trust varies from file to file. For example, a file known to have been created by a user may enjoy a high level of trust by that same user. Similarly, a file created by a trusted party, accompanied by a digital signature attesting to its authenticity, may enjoy the highest level of trust. Alternatively, a file that has been quarantined in a so-called “sandbox” for several days, and that has not exhibited any signs of possessing malware, may be “trusted,” but perhaps to a lesser degree than one digitally signed by a trusted source. Yet another alternative is that a particular file may receive positive feedback from users that it can be trusted. Such a file may receive a trust level based on the volume of feedback regarding its trustworthiness, which may be especially useful with regard to identifying spyware and adware. Thus, according to aspects of the present invention, the white list data store includes more than just file signatures of “trusted” files.

While the preceding discussion of the present invention has been made in reference to a computer, it should be understood that the present invention may be implemented on almost any computing device, including, but not limited to, computers that have a processor, a communications connection, and memory for storing information, and that are capable of performing file signature generation. For example, a suitable computing device may be a personal computer, a notebook or tablet computer, a personal digital assistant (PDA), mini- and mainframe computers, hybrid computing devices (such as cell phone/PDA combinations), and the like.

FIG. 5 is a block diagram illustrating exemplary fields that may exist in a white list data store 410. In one embodiment, the white list data store 410 will store a record for each white listed file in the data store, and each record includes one or more fields for storing information. As shown in FIG. 5, each record in the white list data store 410 includes a signature field 502. The signature field stores the file signature, whether or not the file signature was generated based only on more permanent portions of a file. As mentioned above, it is frequently useful to identify the level of trust that a particular file enjoys. Thus, the exemplary records also include a trust field 504. As illustrated, the trust field includes a numeric value from 1 to 10, with 10 representing the highest trust and 1 the lowest. However, it should be understood that this ranking is illustrative only, and should not be construed as limiting upon the present invention. As yet a further alternative, the trust field 504 could also be used to identify malware. For example, if a file is assigned a trust level of 0, this could be an indication that the file is known to be malware.

Also shown in the white list data store 410 is an additional data field 506. The additional data field 506, as its name suggests, includes information that may be useful to a user with respect to the white listed file. As shown in FIG. 5, the additional data field could identify the reasoning behind the assigned trust level of a file, such as file originator or source, observed behaviors, lack of malware behaviors, and the like. Almost any pertinent information could be stored in the additional data field 506. Similarly, in alternative embodiments, any number of fields could be included in the white list data store 410.
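
Purely as an illustrative sketch, a record with the three fields of FIG. 5 might be modeled in Python as shown below. The class names `WhiteListRecord` and `WhiteListDataStore`, the dictionary-backed storage, and the range check on the trust value are assumptions of this sketch; the description above does not prescribe any particular storage format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class WhiteListRecord:
    signature: str        # signature field 502
    trust: int            # trust field 504: 1 (lowest) to 10 (highest); 0 may flag known malware
    additional_data: str  # additional data field 506, e.g. reasoning behind the trust level


class WhiteListDataStore:
    """In-memory stand-in for the white list data store 410, keyed by signature."""

    def __init__(self) -> None:
        self._records: Dict[str, WhiteListRecord] = {}

    def add(self, record: WhiteListRecord) -> None:
        # Assumed validation: accept the illustrative 1-10 scale plus 0 for malware.
        if not 0 <= record.trust <= 10:
            raise ValueError("trust must be between 0 and 10")
        self._records[record.signature] = record

    def lookup(self, signature: str) -> Optional[WhiteListRecord]:
        """Return the record for a signature, or None if the file is not listed."""
        return self._records.get(signature)
```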

FIG. 6 is a flow diagram illustrating an exemplary routine 600 for determining whether a file is white listed as a trusted file. Beginning at block 602, the computer receives an unknown/untrusted file, meaning that the computer does not yet know whether the file is malware, or whether it has been white listed. At block 604, a signature is generated for the received file. Generating a signature for the file is described below in regard to FIG. 7.

FIG. 7 is a flow diagram illustrating an exemplary subroutine 700 for generating a file signature according to aspects of the present invention, and suitable for use by the routine 600 of FIG. 6. Beginning at decision block 702, a determination is made as to whether the file is a user-modifiable file. If the file is not a user-modifiable file, at block 704, the exemplary subroutine 700 generates a signature for the file based on the entire file. Thereafter, at block 710, the exemplary subroutine 700 returns the generated signature and terminates.

If the file is a user-modifiable file, at block 706, the exemplary subroutine 700 filters out the user-modifiable portions of the file. At block 708, the subroutine 700 then generates the file's signature based on the remaining, unfiltered portions of the file. After having generated the file's signature, at block 710, the exemplary subroutine 700 returns the generated signature and terminates.
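
The flow of subroutine 700 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the file is assumed to have already been parsed into named permanent parts (macros, templates, embedded objects, styles) and user data areas, which is represented here by the hypothetical `ParsedFile` class; SHA-256 and the length-prefixed framing of each part are likewise choices made for the sketch, not details taken from the figures.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedFile:
    """Hypothetical parsed view of a file; real formats need a format-specific parser."""
    user_modifiable: bool
    permanent_parts: Dict[str, bytes] = field(default_factory=dict)
    user_data_areas: List[bytes] = field(default_factory=list)


def _feed(hasher, label: str, payload: bytes) -> None:
    # Length-prefix each piece so that part boundaries cannot be confused.
    for piece in (label.encode("utf-8"), payload):
        hasher.update(len(piece).to_bytes(8, "big"))
        hasher.update(piece)


def generate_signature(parsed: ParsedFile) -> str:
    hasher = hashlib.sha256()
    # The more permanent portions are always part of the signature.
    for name in sorted(parsed.permanent_parts):
        _feed(hasher, name, parsed.permanent_parts[name])
    if not parsed.user_modifiable:
        # Block 704: signature is based on the entire file.
        for index, area in enumerate(parsed.user_data_areas):
            _feed(hasher, f"data:{index}", area)
    # Blocks 706/708: for user-modifiable files the user data areas are
    # filtered out, so nothing further is hashed.
    return hasher.hexdigest()  # Block 710: return the generated signature.
```

Under this sketch, two user-modifiable documents that share the same macros but differ in their user data areas yield the same signature, whereas any change to a macro yields a different one.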

With reference again to FIG. 6, after having generated the file's signature, at block 606, the exemplary routine 600 connects with a white list service 408. As discussed above, the white list service may be a local service/file installed on the computer or on a local area network, or alternatively, a remote white list service such as identified in FIG. 4. Additionally (not shown), there may be a plurality of white list services. For example, a white list service installed on the computer may contain a small number of file signatures that are frequently encountered by the computer. If a signature is not found in the local white list service, the computer may turn to a network white list service that contains a larger number of signatures. Still further, if a signature is not found on either the local or network white list services, a remote/global white list service, such as white list service 408 of FIG. 4, may be consulted. Of course, the remote white list service 408 will likely include only files that are globally available, such as help or service documents from an operating system provider. According to one embodiment, the local white list service is aware of, and in communication with, the network white list service, and the network white list service is aware of, and in communication with, the remote white list service, such that a single request to the local white list service successively checks the others if the file's signature is not found.
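
One way to picture the chained arrangement of local, network, and remote white list services is the following Python sketch. The class name `WhiteListService`, the `fallback` attribute, and the use of plain dictionaries mapping signatures to trust levels are hypothetical; an actual embodiment would communicate over a network rather than through in-process calls.

```python
from typing import Dict, Optional


class WhiteListService:
    """A white list service that defers to a broader service on a miss."""

    def __init__(self, entries: Dict[str, int],
                 fallback: Optional["WhiteListService"] = None) -> None:
        self.entries = entries      # signature -> trust level
        self.fallback = fallback    # next service to consult, if any

    def trust_level(self, signature: str) -> Optional[int]:
        if signature in self.entries:
            return self.entries[signature]
        if self.fallback is not None:
            # A single request to this service successively checks the others.
            return self.fallback.trust_level(signature)
        return None  # Not listed anywhere in the chain.


# Remote/global service, network service, then the small local service.
remote = WhiteListService({"sig-os-help": 10})
network = WhiteListService({"sig-team-template": 8}, fallback=remote)
local = WhiteListService({"sig-daily-report": 9}, fallback=network)
```

A caller only ever asks `local`; a miss there is resolved by the network service and, failing that, by the remote service.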

After connecting with a white list service, at block 608, the routine 600 submits the signature and obtains a trust level corresponding to the file. At decision block 610, assuming the white list service also identifies malware (though the present invention is not so limited), a determination is made as to whether the file was identified as malware. If so, at block 612, the routine processes the malware according to established procedures. Processing malware is well known in the art, and includes actions such as deleting the file, quarantining the file, or purging the malware from the file. Thereafter, the routine 600 terminates.

If the file is not identified as malware according to the trust level obtained from the white list service 408, at block 614, the routine 600 admits the file to the computer system according to established policies relating to the level of trust for the file. For example, if the trust level is at its highest, the computer user is likely satisfied that the file is completely trustworthy, and can admit the file to the system for any purpose. Alternatively, if the trust level is fairly low, the computer system may be programmed to admit the file to the system with certain constraints, such as, but not limited to, quarantining the file for a period of time, executing the file within a so-called sandbox, disabling certain features or network ability while the file operates, and the like. After admitting the file to the computer system, the exemplary routine 600 terminates.

While the above described routine 600 includes a binary, i.e., yes/no, determination in regard to whether the file is or is not malware, in an actual embodiment, a number of determinations may be made according to the trust level associated with the file. For example, a determination may be made as to whether the trust level is greater than a value of 8, such that any file with that level, or greater, of trust is automatically admitted. Similarly, files with trust levels between 3 and 7 may be required to execute within a so-called sandbox for some period of time. Still further, files with trust levels below 3 must be quarantined before admittance to the computer system. Accordingly, the exemplary routine 600 should be viewed as illustrative only, and should not be construed as limiting upon the present invention.
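
The tiered determinations in the example above can be expressed as a small Python function. The function name `admission_policy` and the returned action strings are hypothetical. The sketch reads the example thresholds as follows: a trust level of 8 or greater is admitted, 3 through 7 is sandboxed, and below 3 is quarantined; treating a trust level of 0 as known malware follows the alternative described for the trust field 504 and is likewise an assumption here.

```python
def admission_policy(trust_level: int) -> str:
    """Map a trust level to an action, using the example thresholds only."""
    if trust_level == 0:
        # Assumed convention: 0 marks known malware (see trust field 504).
        return "process as malware"
    if trust_level >= 8:
        return "admit"
    if trust_level >= 3:
        return "sandbox"
    return "quarantine"
```

For instance, `admission_policy(9)` yields `"admit"` and `admission_policy(5)` yields `"sandbox"`. Different deployments would presumably choose different cut-offs.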

As indicated above, irrespective of the ability to generate a signature on more permanent aspects of a file to identify potential malware, such signatures cannot always catch all malware. Thus, a computer user must be cautious, visiting only trustworthy Web sites and downloading only files/content known or trusted to be malware-free. This is especially true as a tendency persists that once a file or content is downloaded to a user's computer, the file/content is presumed to be trustworthy and may be displayed, executed, installed, or otherwise utilized on the user's local computer system. This presumption is further exacerbated because after a file or content is obtained, there has been no legitimate way to determine its origin.

In this light, according to one embodiment, when a file is obtained from an external source (external to the local computer), the file is “tagged,” i.e., associated with information identifying its origin. Tagging information may comprise a variety of forms and information including, but not limited to, a Uniform Resource Locator (URL) or Uniform Resource Identifier (URI) of the file's origin, the author of the file, the domain from which the file was obtained, and the like.
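
As a minimal sketch of what such tagging information might look like, the following Python fragment gathers an origin URL, the domain derived from it, and an optional author into one record. The names `TaggingInfo` and `make_tag` and the choice of fields are assumptions of this sketch; the URL shown is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class TaggingInfo:
    origin_url: str                # URL/URI from which the file was obtained
    domain: Optional[str]          # domain portion of the origin, if it has one
    author: Optional[str] = None   # author of the file, when known


def make_tag(origin_url: str, author: Optional[str] = None) -> TaggingInfo:
    """Build tagging information, deriving the domain from the origin URL."""
    domain = urlparse(origin_url).hostname  # None when the URL carries no host
    return TaggingInfo(origin_url=origin_url, domain=domain, author=author)


tag = make_tag("https://downloads.example.com/tools/setup.exe", author="Example Corp")
```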

While the following description is made with regard to obtaining files from external sources, it is for illustration purposes only and should not be construed as limiting in any manner. For example, the term “file” may be viewed to include files, content, modules, data streams, and the like.

The term “obtaining” a file (or content) is used to denote more than user directed downloading of content from an external source/location. Of course, a user may obtain a file by directing an application, such as a Web browser application, to download a file to the user's local computer; but a user may also obtain files via e-mail, as a result of a file copy operation (initiated locally or externally), by recording a data stream, as a product of a system update operation, and the like. In other words, obtaining a file refers to the addition of the file from an external source to the local computer, irrespective of the action that initiated the addition of the file to the local computer.

In regard to tagging obtained files, FIG. 8 is a block diagram of exemplary components of a computer system 800 suitable for generating signatures for files (as described earlier) and/or for tagging obtained files with tagging information. As shown, the exemplary computer system 800 includes a processor 802 and a memory 804 communicatively connected via a system bus 806. The computer system 800 also includes a file system 808 (typically as part of an operating system, not shown) storing one or more files 810, including externally obtained files 812.

The computer system 800 is shown as including a white-list data store 410 and a black-list data store 814. As discussed above, the white-list data store 410 includes signatures of trusted applications, and may further include tagging information corresponding to trusted locations, authors, sources, etc. In contrast, the black-list data store 814 includes signatures of known malware, and may further include tagging information of untrustworthy locations, authors, sources, etc.

As also shown, the computer system may include an obtained files tag store 816 and a rules data store 818. The obtained files tag store 816 stores information regarding files and their origins. The rules data store 818 includes predetermined rules with regard to how to display or act upon downloaded files, based, of course, on their corresponding tagging information. Also, an anti-malware application 820 may optionally be included with the computer system 800 to validate whether or not a file is malware and, as described in more detail below, to optionally maintain the various lists of tagging information, trustworthy and untrustworthy external locations/sources.

Tagging obtained files may be implemented in a variety of manners, by high level applications and/or low level system functions. For example, in order to take advantage of established rules in regard to obtained files, each application that “obtains” files from external sources could be made responsible for tagging the obtained file with origin information. Thus, applications such as the Web browser, e-mail application, data streaming applications, remote file copy applications, and the like would each be required to tag a file, typically according to predetermined tagging requirements, as a file is obtained. Alternatively, file tagging may be embedded/incorporated into various operating system functions such that file tagging is performed automatically when obtaining content. For example, operating system API functions that download or copy a file from a remote/external location could be enhanced to tag each file as part of its download/copy process. Similarly, each file attached to an e-mail could be tagged with the sender's e-mail address when it is retrieved from a remote location, or when the attached file is saved to the computer system. Moreover, when applications obtain files from remote locations using methods that bypass the normal operating system functions that tag the file, those applications would be responsible for tagging the file.

As mentioned above, the rules data store 818 contains rules with regard to displaying, executing, installing, or acting upon obtained files. For example, the rules may specify whether or not a particular image file downloaded from a specific Web page may be displayed according to the Web site's trustworthiness as established by the white-list and black-list data stores. Similarly, rules may specify whether or not a downloaded application can operate freely on the computer system, should be executed within a so-called “sandbox,” or should be completely quarantined on the computer system. Of course, information in the white-list data store 410 and black-list data store 814 (as well as the rules that use the information) may be updated as a user's confidence in a particular source (origin, domain, author, etc.) increases or decreases, or as files from that origin prove to be trustworthy. Similarly, information as to trustworthiness, including the information in the white-list data store 410 and black-list data store 814, and the rules data store 818, may be updated or maintained by a third party, such as an anti-malware service installed on the computer or a system administrator. Still further, each of the various data stores (white-list, black-list, and rules) may be user-configurable to heighten or lower the levels of restrictions placed on certain obtained files, or selectively enabled/disabled by the user.

Of course, while various components of an exemplary system 800 have been illustrated and described, these components should be viewed as logical components, not necessarily actual components. It should be appreciated that in an actual embodiment, the illustrated components may be combined with one or more other components, and/or with other components of a typical computer system that are not shown in FIG. 8. Similarly, the various data stores, including the white-list data store 410, the black-list data store 814, the rules data store 818, and the obtained files tag store 816, should be viewed as logical data stores, and in an actual embodiment, each of these may be implemented as one or more separate data stores, or may be combined into one or more larger data stores.

In regard to how obtained files are tagged, in most instances it is important that the file/subject matter is not modified. Frequently, but not always, modification of the obtained file will invalidate its suitability for its intended use. Accordingly, in many instances tagging information is associated with the content rather than written into it, and this association may be implemented in a variety of fashions. To that end, on some file/operating systems, such as Microsoft's NTFS file system, a single file is actually comprised of multiple data streams. For example, FIG. 9A illustrates an exemplary file 900 in a file system where each file may be comprised of one or more data streams, such as data streams 902-906. As illustrated in FIG. 9A, file 900 comprises at least three separate data streams: a subject matter data stream 902, a security related data stream 904, and a tagging information data stream 906.

In contrast to file systems supporting multiple streams for a single file, some file systems are implemented as a database, where each file comprises records and/or fields. Thus, in regard to FIG. 9B, in a database file system a given file 900 may be comprised of multiple records and/or fields, including a file content record 912, an access control list record 914, and a tagging information record 916. The records and fields of each file may be stored as contiguous or non-contiguous data (as shown). Still further, some file systems are not particularly well suited to easily associate tagging information with the file in the file system. Thus, as shown in FIG. 9C and as an alternative to a data stream or database file system, tagging information 922 could be stored separately from the obtained file 900, such as in an obtained files tag store 816. The obtained files tag store 816 stores information associating an obtained file with tagging information for the obtained file.
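The separate-store arrangement of FIG. 9C can be illustrated with a minimal sketch. The class name `ObtainedFilesTagStore`, its methods, the table layout, and the example path and URL are all assumptions made for illustration; the description does not prescribe any particular schema. The point shown is only that the origin is recorded next to the file, not inside it, so the obtained file's contents stay unmodified.

```python
import sqlite3


class ObtainedFilesTagStore:
    """Hypothetical stand-in for the obtained files tag store 816."""

    def __init__(self, db_path=":memory:"):
        # An in-memory database keeps the sketch self-contained.
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            "path TEXT PRIMARY KEY, origin TEXT NOT NULL)"
        )

    def tag(self, path, origin):
        # Record (or overwrite) the origin without touching the file itself.
        self._db.execute(
            "INSERT OR REPLACE INTO tags (path, origin) VALUES (?, ?)",
            (path, origin),
        )
        self._db.commit()

    def origin_of(self, path):
        # Return the recorded origin, or None if the file was never tagged.
        row = self._db.execute(
            "SELECT origin FROM tags WHERE path = ?", (path,)
        ).fetchone()
        return row[0] if row else None


store = ObtainedFilesTagStore()
store.tag("/downloads/setup.exe", "https://example.com/setup.exe")
```

A lookup for a file that was never tagged returns `None`, which a caller could treat as an unknown origin.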

Clearly, while various embodiments/implementations for storing tagging information have been described, there are numerous ways in which tagging information may be associated with an obtained file. Accordingly, the above described implementations should be viewed as illustrative only, and not construed as limiting upon the present invention.

In regard to tagging obtained files and content, FIG. 10 is a flow diagram illustrating an exemplary routine 1000 for tagging an obtained file and, optionally, applying rules according to the file's origin. Beginning at block 1002, a file is obtained from an external location, i.e., external to the local computer system. At block 1004, the obtained file is tagged with the source location of the file. As described above, this may be done by the high level application that initiates obtaining the file, or by low level functions (i.e., operating system services) called by the high level application to obtain the content, or a combination of both. Moreover, tagging information may be stored in an alternate data stream, as a field or record associated with the file in a database file system, or in an obtained files tag store 816.

Not all obtained files are immediately acted upon (beyond simply storing the file to the local computer system). If no immediate action is required, the exemplary routine 1000 may terminate. However, quite frequently a file is obtained for immediate action, such as displaying a downloaded image or Web page on the computer, or execution on the computer. Thus, after tagging the file with its source information (e.g., a path, URL, domain, author, etc.), the exemplary routine 1000 optionally processes the obtained file according to predetermined rules from a rules data store 818. More particularly, at block 1006 the exemplary routine 1000 determines the trustworthiness of the obtained file according to its tagging information and the information in the white-list data store 410, the black-list data store 814, and/or the anti-malware application 820.

Once the trustworthiness (or un-trustworthiness) of the file is determined, at block 1008, the obtained file is processed according to a set of predetermined rules based on the trustworthiness particularly, and tagging information generally. For example, if, according to the tagging information, the obtained file originated from a source location known to frequently distribute malware, as defined in the black-list data store 814 or by the anti-malware application 820, the predetermined rules may dictate that the obtained file be quarantined, or executed within a so-called sandbox to limit any potential ill effects its display, execution, or installation may cause on the local computer system. Similarly, if the obtained file is identified as a trustworthy file, such as through information in the white-list data store 410, displaying, executing, installing, etc., may be carried out on the local computer system without restrictions.

With regard to the trustworthiness of an obtained file, various means may be employed to rate or establish the trustworthiness of an origin. For example, a value may be associated with an origin of files that indicates the level of trustworthiness for files from that origin (e.g., URI, author, domain, stream, etc.). The gradation of these values may range from a simple trust/no-trust value on up. For example, values may be graded from 0 to 10, with 0 representing a non-trusted origin and 10 representing a completely trusted origin. Moreover, when an origin is unknown (at least as to its trustworthiness), some value such as 3 or 4 may be used to indicate the unknown quality of this origin.
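The graded scale can be sketched as a simple lookup with a fallback for unknown origins. The table contents, the function name `trust_level`, and the choice of 3 (rather than 4) for unknown origins are assumptions made for illustration only.

```python
# The description suggests "3 or 4" for unknown origins; 3 is an arbitrary pick.
UNKNOWN_TRUST = 3

# Hypothetical ratings keyed by origin (here, a domain name).
trust_ratings = {
    "updates.example.com": 10,  # completely trusted origin
    "malware.example.net": 0,   # non-trusted origin
}


def trust_level(origin, ratings=trust_ratings):
    """Return the 0-10 rating for an origin, falling back to the unknown value."""
    level = ratings.get(origin, UNKNOWN_TRUST)
    if not 0 <= level <= 10:
        raise ValueError(f"rating for {origin!r} is outside 0-10: {level}")
    return level
```

Keeping the unknown value above 0 lets rules distinguish an unrated origin from one that is known to be untrustworthy.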

Of course, quite frequently, perhaps the majority of the time, an obtained file may not be identified as either trustworthy or untrustworthy according to information in the black-list data store 814, the white-list data store 410, or from the anti-malware application 820. Simply put, the origin of the file is unknown as to whether or not it is trustworthy. However, even though a file's origin may not be evaluated as trustworthy or untrustworthy, predetermined rules from the rules data store 818 could be used to determine how, if at all, the obtained file (whose origin is not known) may be displayed, executed, or otherwise used on the local computer system.

Once the obtained file has been processed, the exemplary routine 1000 terminates.
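The flow of routine 1000 (tag at block 1004, determine trustworthiness at block 1006, apply rules at block 1008) might be sketched as follows. The list contents, the rule table, and the function names are hypothetical; in particular, sending unknown origins to a sandbox is one possible rule, not one the description mandates.

```python
from urllib.parse import urlparse

# Hypothetical contents of the white-list 410, black-list 814, and rules 818 stores.
WHITE_LIST = {"updates.example.com"}
BLACK_LIST = {"malware.example.net"}
RULES = {"trusted": "allow", "untrusted": "quarantine", "unknown": "sandbox"}


def tag_file(tags, path, source_url):
    # Block 1004: associate the source location with the obtained file.
    tags[path] = source_url


def determine_trust(tags, path):
    # Block 1006: compare the tagged origin against the two lists.
    domain = urlparse(tags[path]).hostname
    if domain in BLACK_LIST:
        return "untrusted"
    if domain in WHITE_LIST:
        return "trusted"
    return "unknown"


def process(tags, path):
    # Block 1008: pick the action the rules prescribe for this trust level.
    return RULES[determine_trust(tags, path)]


tags = {}
tag_file(tags, "patch.msi", "https://updates.example.com/patch.msi")
tag_file(tags, "game.exe", "http://malware.example.net/game.exe")
tag_file(tags, "photo.jpg", "https://unknown.example.org/photo.jpg")
```

Here a plain dictionary stands in for whichever tagging mechanism is used (alternate data stream, database record, or tag store); only the lookup by file matters to the later blocks.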

As an alternative to the above described routine 1000, an alternate exemplary routine 1100 for processing an obtained file is presented. Beginning at block 1102, a file is obtained from an external location. At block 1104, the obtained file and its origin are delivered to the computer system's anti-malware application 820. Similar to the process described in regard to FIG. 10, the high level application that initiated obtaining the content from the external location may call the anti-malware application 820 with the obtained file and its origin, or, alternatively, calling the anti-malware application 820 with the obtained file and its origin may be integrated into the operating system functions that are used to obtain the content.

At block 1106, the anti-malware application 820 persists/stores the obtained file's origin (i.e., “tags” the obtained file). Of course, this may mean that the anti-malware application 820 stores the origin in an alternate data stream, as a record in the database file system, or in an obtained files tag store 816. Alternatively, while not shown, the anti-malware application 820 may persist the obtained file's origin in a data store accessible only to or by the anti-malware application 820. In fact, placing the obtained file's origin in a data store accessible only to the anti-malware application 820 could lead to greater security. For example, when tagging information is available generally, such as in an alternate data stream, a field in a database, or a record in an obtained files tag store 816, a particular malware process may target that information and corrupt it such that predetermined rules would allow that file's execution when it would otherwise not be permitted. However, if the tagging information (i.e., the obtained file's origin) were located in a data store accessible only to the anti-malware application 820, it would be that much more difficult to corrupt and compromise the tagging information.

Assuming that immediate action is requested on the obtained file, the obtained file is optionally processed. At decision block 1108, a determination is made as to whether the obtained file is malware. If the anti-malware application stores this information, determining the obtained file's trustworthiness is a matter of querying the anti-malware application 820 regarding the obtained file. The anti-malware application 820 then returns the obtained file's trustworthiness.

If the obtained file is trustworthy, at block 1110 the file is processed according to the requested action, i.e., execution, display, installation, etc. Thereafter, or if the obtained file is not trustworthy, the routine 1100 terminates.
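Routine 1100 might be sketched as below, with the anti-malware application holding the tagging information itself. The class `AntiMalwareApplication`, its methods, and the helper `handle_obtained_file` are hypothetical names. Note that Python's leading-underscore attributes are a naming convention only and do not enforce the "accessible only to the anti-malware application" isolation the description contemplates; a real system would need process or operating system level protection for that.

```python
class AntiMalwareApplication:
    """Hypothetical stand-in for the anti-malware application 820."""

    def __init__(self, untrusted_origins):
        self._untrusted = set(untrusted_origins)
        self._origins = {}  # private store, reached only through this object

    def submit(self, path, origin):
        # Blocks 1104-1106: receive the file with its origin and persist the tag.
        self._origins[path] = origin

    def is_trustworthy(self, path):
        # Decision block 1108: untagged files and listed origins are not trusted.
        origin = self._origins.get(path)
        return origin is not None and origin not in self._untrusted


def handle_obtained_file(app, path, origin, action):
    app.submit(path, origin)
    if app.is_trustworthy(path):
        return action(path)  # block 1110: perform the requested action
    return None              # not trustworthy: terminate without acting


app = AntiMalwareApplication({"malware.example.net"})
result_ok = handle_obtained_file(
    app, "report.pdf", "files.example.com", lambda p: f"opened {p}"
)
result_bad = handle_obtained_file(
    app, "game.exe", "malware.example.net", lambda p: f"opened {p}"
)
```

Treating every origin that is not listed as trustworthy is a simplification chosen to keep the sketch short; a stricter policy could require the origin to appear on a white-list.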

As mentioned above, tagging an obtained file may be implemented at the operating system level such that when a file is obtained, it is automatically tagged. FIG. 12 is a block diagram illustrating aspects of an exemplary operating system 1200 configured to automatically tag an obtained file with tagging information.

The illustrated operating system includes typical logical components such as a file system component 1202, a memory management component 1204, an operating system kernel component 1206, an application execution component 1208, and a plurality of API functions 1210 that are callable by executing applications.

Key API functions, such as copy 1212, URL download 1214, and the like are configured to automatically tag each file, i.e., store the origin information for each obtained file, such as storing the tagging information in the obtained files tag store 816, as indicated by arrow 1216 of FIG. 12.
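As a rough illustration of a copy function that tags automatically, the sketch below wraps a standard file copy so that every call also records the origin. The function name `tagged_copy` and the use of a plain dictionary in place of the obtained files tag store 816 are assumptions; an actual operating system would implement this inside its own API functions, not in an application-level wrapper.

```python
import os
import shutil
import tempfile


def tagged_copy(src, dst, tag_store):
    """Copy src to dst, then record src as the origin of dst.

    Hypothetical wrapper; tag_store is any dict-like mapping standing in
    for the obtained files tag store 816.
    """
    shutil.copyfile(src, dst)
    tag_store[str(dst)] = str(src)
    return dst


# Demo with throwaway files; the source path plays the role of an external location.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "remote.txt")
dst = os.path.join(workdir, "local.txt")
with open(src, "w") as f:
    f.write("hello")

tag_store = {}
tagged_copy(src, dst, tag_store)
```

Because the tagging happens inside the copy function, the calling application does not need to know about tagging at all, which is the property the operating-system-level approach is meant to provide.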

While a very simplified, logical set of operating system components has been shown in FIG. 12, it is for illustration purposes only, and should not be construed as limiting upon the present invention. Clearly, those skilled in the art will appreciate that nearly all operating systems are very complex systems. However, as operating systems are known in the art, the simplification shown in FIG. 12 is to illustrate that various functions offered by the operating system are configured to automatically provide tagging information for each obtained file.

While various embodiments, including the preferred embodiment, of the invention have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
 1. A computer system for providing enhanced security with regard to obtained files, the computer system configured to: receive a user-modifiable file; tag the user-modifiable file with tagging information, the tagging information associated with a source of the user-modifiable file; determine a trustworthiness of the user-modifiable file based at least in part on the tagging information; and process the user-modifiable file based at least in part on the trustworthiness.
 2. The computer system of claim 1, wherein the computer system is configured to act upon a received user-modifiable file according to a rule in a rules data store corresponding to a particular trustworthiness determined for the received user-modifiable file.
 3. The computer system of claim 1 further comprising a black-list data store comprising information regarding untrustworthy sources, wherein the computer system is configured to determine the trustworthiness by determining whether the source of the user-modifiable file, as described in the tagging information, is included in the black-list data store.
 4. The computer system of claim 1 further comprising a white-list data store comprising information regarding trustworthy sources, wherein the computer system is configured to determine the trustworthiness by determining whether the source of the user-modifiable file, as described in the tagging information, is included in the white-list data store.
 5. The computer system of claim 1, wherein the computer system is configured to tag the user-modifiable file with the tagging information by storing information regarding the source as one or more separate data streams.
 6. The computer system of claim 1, wherein the computer system is configured to tag the user-modifiable file with the tagging information by storing information regarding the source as one or more records in a file system database.
 7. The computer system of claim 1, wherein the computer system is configured to tag the user-modifiable file with the tagging information by storing information regarding the source in a tag store.
 8. The computer system of claim 1, wherein the computer system is configured to tag the user-modifiable file with the tagging information via an anti-malware component configured to store the tagging information in a data store accessible only to the anti-malware component.
 9. The computer system of claim 1, wherein the computer system is configured to automatically store tagging information for the user-modifiable file as part of a function of obtaining the user-modifiable file.
 10. The computer system of claim 1, wherein the computer system is configured to generate the tagging information.
 11. A method for enhancing the security of a computing device with regard to a file obtained from an external source, the method comprising: receiving a user-modifiable file; tagging the user-modifiable file with tagging information, the tagging information identifying a source of the user-modifiable file; and determining whether to process the user-modifiable file based on a trustworthiness of the user-modifiable file ascertained from the tagging information.
 12. The method of claim 11 further comprising: processing the user-modifiable file according to predetermined rules corresponding to the trustworthiness.
 13. The method of claim 11, further comprising ascertaining the trustworthiness by comparing the tagging information to a white-list of trustworthy sources.
 14. The method of claim 11, further comprising ascertaining the trustworthiness by comparing the tagging information to a black-list of untrustworthy sources.
 15. The method of claim 11, wherein the tagging further comprises storing the source of the user-modifiable file in an alternate data stream of the user-modifiable file.
 16. The method of claim 11, wherein the tagging further comprises storing the source of the user-modifiable file via an anti-malware application.
 17. The method of claim 11, further comprising generating the tagging information.
 18. A computer-readable storage device having encoded thereon instructions that facilitate a plurality of acts, the plurality of acts including: obtaining a user-modifiable file; tagging the user-modifiable file with source information, the source information identifying an origin of the user-modifiable file; and determining whether to process the user-modifiable file based on a trustworthiness of the user-modifiable file ascertained from the source information.
 19. The computer-readable storage device of claim 18, wherein the plurality of acts further comprise generating the source information.
 20. The computer-readable storage device of claim 19, wherein the plurality of acts further comprise ascertaining the origin based at least in part on information regarding an external location from which the user-modifiable file is obtained.